{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2025 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JAPoU8Sm5E6e"
      },
      "source": [
        "# Serving Gemma 3 with Ollama on Cloud Run\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fopen-models%2Fserving%2Fcloud_run_ollama_gemma3_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/cloud_run_ollama_gemma3_inference.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>            "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "83b98b0ba19c"
      },
      "source": [
        "<img src=\"https://ollama.com/public/ollama.png\" height=\"200px\" alignment=\"center\"/>\n",
        "<img src=\"https://cloud.google.com/static/architecture/images/ac-page-icons/card_google_cloud_partner.svg\" height=\"200px\">\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "84f0f73a0f76"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "| Author(s) | [Vlad Kolesnikov](https://github.com/vladkol) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ccd500ae19b5"
      },
      "source": [
        "## Overview"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvgnzT1CKxrO"
      },
      "source": [
        "> [**Gemma 3**](https://ai.google.dev/gemma) is a new generation of open models developed by Google. It is a collection of lightweight, state-of-the-art open models built from the same research and technology that powers our Gemini 2.0 models. Gemma 3 comes in a range of sizes (1B, 4B, 12B and 27B), allowing you to choose the best model for your specific hardware and performance needs. Gemma 3 models are available through platforms like Google AI Studio, Kaggle, and Hugging Face.\n",
        "\n",
        "> **[Cloud Run](https://cloud.google.com/run)**:\n",
        "It's a serverless platform by Google Cloud for running containerized applications. It automatically scales and manages infrastructure, supporting various programming languages. Cloud Run now offers GPU acceleration for AI/ML workloads. With 30 seconds to the first token, Cloud Run is a perfect platform for serving lightweight models like Gemma.\n",
        "\n",
        "> **Note:** GPU support in Cloud Run is in preview. To use the GPU feature, you must request `Total Nvidia L4 GPU allocation, per project per region` quota under Cloud Run in the [Quotas and system limits page](https://cloud.google.com/run/quotas#increase).\n",
        "\n",
        "\n",
        "> **[Ollama](ollama.com)**: is an open-source tool for easily running and deploying large language models locally. It offers simple management and usage of LLMs on personal computers or servers.\n",
        "\n",
        "This notebook showcase how to deploy [Google Gemma 3](https://developers.googleblog.com/en/introducing-gemma3) in Cloud Run, with the objective to build a simple API for chat or RAG applications.\n",
        "\n",
        "By the end of this notebook, you will learn how to:\n",
        "\n",
        "1. Deploy Google Gemma 3 as an OpenAI-compatible API on Cloud Run using Ollama.\n",
        "2. Build a custom container with Ollama to deploy any Large Language Model (LLM) of your choice.\n",
        "3. Make requests to an API hosted on Cloud Run."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "61RBz8LLbxCR"
      },
      "source": [
        "## Get started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aOiPjM5DEPhK"
      },
      "source": [
        "### Install Google Cloud SDK\n",
        "\n",
        "Make sure you Google Cloud SDK is installed (try running `gcloud version`) or [install it](https://cloud.google.com/sdk/docs/install) before executing this notebook.\n",
        "\n",
        "> If you are running in Colab or Vertex AI workbench, you have Google Cloud SDK installed."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HfAVa08RDDJB"
      },
      "source": [
        "### Choose a model, a project, and a region to host the model\n",
        "\n",
        "Choose a Gemma 3 model to use, a Google Cloud project to host your Cloud Run service, and a region to host it in.\n",
        "\n",
        "If you don't have a project yet:\n",
        "\n",
        "1. [Create a project](https://console.cloud.google.com/projectcreate) in the Google Cloud Console.\n",
        "2. Copy your `Project ID` from the project's [Settings page](https://console.cloud.google.com/iam-admin/settings).\n",
        "\n",
        "The project must have `Total Nvidia L4 GPU allocation, per project per region` quota allocated in the selected region.\n",
        "To make sure it's available, check Cloud Run in the [Quotas and system limits page](https://console.cloud.google.com/iam-admin/quotas)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "TV0pbqJHDDJB"
      },
      "outputs": [],
      "source": [
        "# { display-mode: \"form\", run: \"auto\" }\n",
        "\n",
        "MODEL = \"gemma3:4b\"  # @param {type:\"string\", isTemplate: true}\n",
        "\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\", isTemplate: true}\n",
        "REGION = \"us-central1\"  # @param {type:\"string\", isTemplate: true}\n",
        "\n",
        "if PROJECT_ID == \"[your-project-id]\" or not PROJECT_ID:\n",
        "    print(\"Please specify your project id in PROJECT_ID variable.\")\n",
        "    raise KeyboardInterrupt\n",
        "\n",
        "MODEL_NAME_ESCAPED = MODEL.translate(str.maketrans(\".:/\", \"---\"))\n",
        "SERVICE_NAME = f\"ollama--{MODEL_NAME_ESCAPED}\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dmWOrTJ3gx13"
      },
      "source": [
        "### Authenticate your Google Cloud account\n",
        "\n",
        "Depending on your Jupyter environment, you may have to manually authenticate. Run the cell below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Xc8Jm1P3Y7fs"
      },
      "outputs": [],
      "source": [
        "!gcloud auth print-identity-token -q &> /dev/null || gcloud auth login --project=\"{PROJECT_ID}\" --update-adc --quiet"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l728UOEPDDJB"
      },
      "source": [
        "## Prepare container image\n",
        "\n",
        "First, let's create a Docker file for a container with the model embedded into it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "glBn9gPKDDJB"
      },
      "outputs": [],
      "source": [
        "%%writefile Dockerfile\n",
        "\n",
        "FROM ollama/ollama:0.6.0\n",
        "\n",
        "ARG MODEL\n",
        "\n",
        "# Set the model name\n",
        "ENV MODEL=$MODEL\n",
        "\n",
        "# Set the host and port to listen on\n",
        "ENV OLLAMA_HOST 0.0.0.0:8080\n",
        "\n",
        "# Set the directory to store model weight files\n",
        "ENV OLLAMA_MODELS /models\n",
        "\n",
        "# Reduce the verbosity of the logs\n",
        "ENV OLLAMA_DEBUG false\n",
        "\n",
        "# Do not unload model weights from the GPU\n",
        "ENV OLLAMA_KEEP_ALIVE -1\n",
        "\n",
        "# Start the ollama server and download the model weights\n",
        "RUN ollama serve & sleep 5 && ollama pull $MODEL\n",
        "\n",
        "# At startup time we start the server and run a dummy request\n",
        "# to request the model to be loaded in the GPU memory\n",
        "ENTRYPOINT [\"/bin/sh\"]\n",
        "CMD [\"-c\", \"ollama serve  & (ollama run $MODEL 'Say one word' &) && wait\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T4gS8ovMDDJB"
      },
      "source": [
        "Second, we create a Cloud Build file to use for building and pushing our container image."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1dV1_cMDDDJB"
      },
      "outputs": [],
      "source": [
        "%%writefile cloudbuild.yaml\n",
        "\n",
        "steps:\n",
        "- name: 'gcr.io/cloud-builders/docker'\n",
        "  id: build\n",
        "  entrypoint: 'bash'\n",
        "  args:\n",
        "    - -c\n",
        "    - |\n",
        "        docker buildx build --tag=${_IMAGE} --build-arg MODEL=${_MODEL} .\n",
        "\n",
        "images: [\"${_IMAGE}\"]\n",
        "\n",
        "substitutions:\n",
        "  _IMAGE: '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_AR_REPO_NAME}/${_SERVICE_NAME}'\n",
        "\n",
        "options:\n",
        "  dynamicSubstitutions: true\n",
        "  machineType: \"E2_HIGHCPU_32\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vbDiABJcDDJC"
      },
      "source": [
        "## Build Container Image and Deploy Cloud Run Service\n",
        "\n",
        "We are ready to build our container image and deploy Cloud Run service.\n",
        "\n",
        "The script below performs the following actions:\n",
        "\n",
        "* Enables necessary APIs.\n",
        "* Creates an Artifact Repository for the image.\n",
        "* Creates a Service Account for the service.\n",
        "* Submits a Cloud Build job to create and push the container image.\n",
        "* Deploys the Cloud Run service.\n",
        "\n",
        "> The script may take 10-45 minutes to finish.\n",
        "\n",
        "Note the following important flags in Cloud Build deployment command:\n",
        "\n",
        "* `--concurrency 4` is set to match the value of the environment variable `OLLAMA_NUM_PARALLEL`.\n",
        "* `--gpu 1` with `--gpu-type nvidia-l4` assigns 1 NVIDIA L4 GPU to every Cloud Run instance in the service.\n",
        "`--no-allow-authenticated` restricts unauthenticated access to the service.\n",
        "By keeping the service private, you can rely on Cloud Run's built-in [Identity and Access Management (IAM)](https://cloud.google.com/iam) authentication for service-to-service communication.\n",
        "* `--no-cpu-throttling` is required for enabling GPU.\n",
        "* `--service-account` the service identity of the service.\n",
        "* `--max-instances` sets maximum number of instances of the service.\n",
        "It has to be equal to or lower than your project's NVIDIA L4 GPU (`Total Nvidia L4 GPU allocation, per project per region`) quota.\n",
        "\n",
        "For optimal GPU utilization, increase `--concurrency`, keeping it within twice the value of `OLLAMA_NUM_PARALLEL`.\n",
        "While this leads to request queuing in Ollama, it can help improve utilization:\n",
        "Ollama instances can immediately process requests from their queue, and the queues help absorb traffic spikes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TXg7IYU1DDJC"
      },
      "outputs": [],
      "source": [
        "%%writefile deploy.sh\n",
        "\n",
        "PROJECT_ID=$1\n",
        "REGION=$2\n",
        "MODEL_ID=\"${3}\"\n",
        "SERVICE_NAME=\"${4}\"\n",
        "AR_REPO_NAME=\"ollama-repo\"\n",
        "SERVICE_ACCOUNT=\"ollama-cloud-run-sa\"\n",
        "SERVICE_ACCOUNT_ADDRESS=\"${SERVICE_ACCOUNT}@$PROJECT_ID.iam.gserviceaccount.com\"\n",
        "MAX_INSTANCES=1 # Adjust this value to match your Cloud Run L4 GPU quota (\"Total Nvidia L4 GPU allocation, per project per region\", NvidiaL4GpuAllocPerProjectRegion, run.googleapis.com/nvidia_l4_gpu_allocation)\n",
        "\n",
        "echo \"Enabling APIs in project ${PROJECT_ID}.\"\n",
        "gcloud services enable run.googleapis.com \\\n",
        "    cloudbuild.googleapis.com \\\n",
        "    artifactregistry.googleapis.com \\\n",
        "    --project ${PROJECT_ID} \\\n",
        "    --quiet\n",
        "\n",
        "set -e\n",
        "\n",
        "# Creating the service account if doesn't exist.\n",
        "sa_list=$(gcloud iam service-accounts list --quiet --format 'value(email)' --project $PROJECT_ID --filter=email:$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com 2>/dev/null)\n",
        "if [ -z \"${sa_list}\" ]; then\n",
        "    echo \"Creating Service Account ${SERVICE_ACCOUNT}.\"\n",
        "    gcloud iam service-accounts create $SERVICE_ACCOUNT \\\n",
        "        --project ${PROJECT_ID} \\\n",
        "        --display-name=\"${SERVICE_ACCOUNT} - Cloud Run Service Account\"\n",
        "fi\n",
        "\n",
        "# Creating the Artifacts Repository if doesn't exist\n",
        "repo_list=$(gcloud artifacts repositories list --format 'value(name)' --filter=name=\"projects/${PROJECT_ID}/locations/${REGION}/repositories/${AR_REPO_NAME}\" --project ${PROJECT_ID} --quiet --location ${REGION} 2>/dev/null)\n",
        "if [ -z \"${repo_list}\" ]; then\n",
        "    echo \"Creating Artifact Registry ${AR_REPO_NAME}.\"\n",
        "    gcloud artifacts repositories create $AR_REPO_NAME \\\n",
        "    --repository-format docker \\\n",
        "    --location ${REGION} \\\n",
        "    --project=${PROJECT_ID}\n",
        "fi\n",
        "\n",
        "echo \"Building container image.\"\n",
        "gcloud builds submit --config=cloudbuild.yaml --project=${PROJECT_ID} . \\\n",
        "    --suppress-logs \\\n",
        "    --substitutions \\\n",
        "  _AR_REPO_NAME=$AR_REPO_NAME,_REGION=$REGION,_SERVICE_NAME=$SERVICE_NAME,_MODEL=$MODEL_ID\n",
        "rm -f cloudbuild.yaml\n",
        "rm -f Dockerfile\n",
        "\n",
        "echo \"Deploying Service ${SERVICE_NAME}.\"\n",
        "gcloud beta run deploy $SERVICE_NAME \\\n",
        "    --project=${PROJECT_ID} \\\n",
        "    --image=${REGION}-docker.pkg.dev/$PROJECT_ID/$AR_REPO_NAME/$SERVICE_NAME \\\n",
        "    --service-account $SERVICE_ACCOUNT_ADDRESS \\\n",
        "    --cpu=8 \\\n",
        "    --memory=32Gi \\\n",
        "    --gpu=1 --gpu-type=nvidia-l4 \\\n",
        "    --concurrency 4 \\\n",
        "    --set-env-vars OLLAMA_NUM_PARALLEL=4 \\\n",
        "    --region ${REGION} \\\n",
        "    --no-allow-unauthenticated \\\n",
        "    --max-instances ${MAX_INSTANCES} \\\n",
        "    --no-cpu-throttling \\\n",
        "    --timeout 1h\n",
        "\n",
        "SERVICE_URL=$(gcloud run services describe ${SERVICE_NAME} --project=${PROJECT_ID} --region $REGION --format 'value(status.url)' --quiet)\n",
        "echo \"✅ Success!\"\n",
        "echo \"🚀 Service URL: ${SERVICE_URL}\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "e6L2dVGyOAxB"
      },
      "outputs": [],
      "source": [
        "!/bin/bash ./deploy.sh \"{PROJECT_ID}\" \"{REGION}\" \"{MODEL}\" \"{SERVICE_NAME}\" && rm -f ./deploy.sh"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dgaJ62rmDDJC"
      },
      "source": [
        "## Test the deployed service\n",
        "\n",
        "Now, let's test the service you deployed.\n",
        "\n",
        "First, simply by using `cURL`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "iX7LmWwGDDJC"
      },
      "outputs": [],
      "source": [
        "%%bash -s $MODEL $SERVICE_NAME $PROJECT_ID $REGION\n",
        "\n",
        "PROMPT=\"Hello!\"\n",
        "SERVICE_URL=$(gcloud run services describe ${2} --project ${3} --region ${4} --format 'value(status.url)' --quiet)\n",
        "AUTH_TOKEN=$(gcloud auth print-identity-token -q)\n",
        "\n",
        "curl -s -X POST \"${SERVICE_URL}/api/generate\" \\\n",
        "-H \"Authorization: Bearer ${AUTH_TOKEN}\" \\\n",
        "-H \"Content-Type: application/json\" \\\n",
        "-d '{ \"model\": \"'${1}'\", \"prompt\": \"'${PROMPT}'\", \"max_tokens\": 100, \"stream\": false}'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "63oQqBNmDDJC"
      },
      "source": [
        "### Ollama Python Library\n",
        "\n",
        "You can also use Ollama Python Library to make requests to the service you deployed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "h8G3te5pDDJC"
      },
      "outputs": [],
      "source": [
        "# Install Ollama Python Library\n",
        "%pip install ollama -q"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X-2TbV6tDDJC"
      },
      "outputs": [],
      "source": [
        "import subprocess\n",
        "\n",
        "from ollama import Client\n",
        "\n",
        "identity_token = (\n",
        "    subprocess.check_output(\"gcloud auth print-identity-token -q\", shell=True)\n",
        "    .decode()\n",
        "    .strip()\n",
        ")\n",
        "service_url = (\n",
        "    subprocess.check_output(\n",
        "        (\n",
        "            \"gcloud run services describe \"\n",
        "            f\"{SERVICE_NAME} --project={PROJECT_ID} \"\n",
        "            f\"--region={REGION} \"\n",
        "            \"--format='value(status.url)' -q\"\n",
        "        ),\n",
        "        shell=True,\n",
        "    )\n",
        "    .decode()\n",
        "    .strip()\n",
        ")\n",
        "client = Client(host=service_url, headers={\"Authorization\": f\"Bearer {identity_token}\"})\n",
        "stream = client.chat(\n",
        "    model=MODEL,\n",
        "    messages=[{\"role\": \"user\", \"content\": \"Why is the sky blue?\"}],\n",
        "    stream=True,\n",
        ")\n",
        "for chunk in stream:\n",
        "    print(chunk[\"message\"][\"content\"], end=\"\", flush=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FFL2V_ClDDJD"
      },
      "source": [
        "## Conclusion\n",
        "Congratulations! 💎 Now you know how to deploy Gemma 3 with Ollama to Cloud Run powered by a GPU!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f6f17f9aff65"
      },
      "source": [
        "## Cleaning up"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "s1blF2ziDDJD"
      },
      "source": [
        "To delete the Cloud Run service you created, you can uncomment and run the following cell."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VbhAz7-9DDJD"
      },
      "outputs": [],
      "source": [
        "# !gcloud run services delete $SERVICE_NAME --project $PROJECT_ID --region $LOCATION --quiet"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "cloud_run_ollama_gemma3_inference.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
